Breaking the Resource Bottleneck for Multilingual Parsing
Abstract
We propose a framework that enables the acquisition of annotation-heavy resources, such as syntactic dependency tree corpora, for low-resource languages by importing linguistic annotations from high-quality English resources. We present a large-scale experiment showing that Chinese dependency trees can be induced using an English parser, a word alignment package, and a large corpus of sentence-aligned bilingual text. As part of the experiment, we evaluate the quality of a Chinese parser trained on the induced dependency treebank. We find that a parser trained in this manner outperforms some simple baselines in spite of the noise in the induced treebank. The results suggest that projecting syntactic structures from English is a viable option for acquiring annotated syntactic structures quickly and cheaply. We expect the quality of the induced treebank to improve when more sophisticated filtering and error-correction techniques are applied.
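The abstract does not spell out the projection procedure, so the following is only a minimal sketch of the general idea under simplifying assumptions: word alignments are one-to-one, and a dependency is copied to the target side only when both the dependent and its head are aligned. The function name `project_heads` and the data layout are hypothetical, chosen for illustration, and are not taken from the paper.

```python
from typing import Dict, List, Optional


def project_heads(
    src_heads: List[int],
    alignment: Dict[int, int],
    tgt_len: int,
) -> List[Optional[int]]:
    """Project dependency heads from a source sentence onto a target sentence.

    src_heads[i] is the index of the head of source token i (-1 marks the root).
    alignment maps a source token index to a target token index
    (assumed one-to-one; this is a simplification).
    Returns one entry per target token: a head index, -1 for the root,
    or None when no head could be projected.
    """
    tgt_heads: List[Optional[int]] = [None] * tgt_len
    for s_dep, s_head in enumerate(src_heads):
        t_dep = alignment.get(s_dep)
        if t_dep is None:
            continue  # dependent has no aligned target token
        if s_head == -1:
            tgt_heads[t_dep] = -1  # the root projects to the root
            continue
        t_head = alignment.get(s_head)
        if t_head is not None:
            tgt_heads[t_dep] = t_head  # copy the edge across the alignment
    return tgt_heads


# English "I like tea": "like" is the root; "I" and "tea" depend on it.
english_heads = [1, -1, 1]
# Hypothetical one-to-one alignment to a three-token Chinese sentence.
full_alignment = {0: 0, 1: 1, 2: 2}
projected = project_heads(english_heads, full_alignment, 3)
```

Target tokens left as `None` are one source of the noise the abstract mentions; a filtering step could, for example, discard sentences with too many unprojected tokens before training.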
Similar resources
An Efficient Approach for Bottleneck Resource(s) Detection Problem in the Multi-objective Dynamic Job Shop Environments
Nowadays, energy saving is one of the crucial aspects of decision making. One approach to it is the efficient use of resources in industrial systems. Studies of real manufacturing systems indicate that one or more machines may act as the Bottleneck Resource(s) (BR). On the other hand, according to the Theory of Constraints (TOC), the efficient use of resources in manufact...
Universal Dependency Annotation for Multilingual Parsing
We present a new collection of treebanks with homogeneous syntactic dependency annotation for six languages: German, English, Swedish, Spanish, French and Korean. To show the usefulness of such a resource, we present a case study of crosslingual transfer parsing with more reliable evaluation than has been possible before. This ‘universal’ treebank is made freely available in order to facilitate...
Multilingual Projection for Parsing Truly Low-Resource Languages
We propose a novel approach to cross-lingual part-of-speech tagging and dependency parsing for truly low-resource languages. Our annotation projection-based approach yields tagging and parsing models for over 100 languages. All that is needed are freely available parallel texts, and taggers and parsers for resource-rich languages. The empirical evaluation across 30 test languages shows that our...
Cross-Lingual Parser Selection for Low-Resource Languages
In multilingual dependency parsing, transferring delexicalized models provides unmatched language coverage and competitive scores, with minimal requirements. Still, selecting the single best parser for any target language poses a challenge. Here, we propose a lean method for parser selection. It offers top performance, and it does so without disadvantaging the truly low-resource languages. We c...
Delexicalized transfer parsing for low-resource languages using transformed and combined treebanks
This paper describes the IIT Kharagpur dependency parsing system in the CoNLL 2017 shared task on Multilingual Parsing from Raw Text to Universal Dependencies. We primarily focus on the low-resource languages (surprise languages). We have developed a framework to combine multiple treebanks to train parsers for low-resource languages by a delexicalization method. We have applied transformation on the ...